Initial RVY support #1

arichardson · 2025-09-30T19:29:54Z

Please take a look and let me know what you think before I send this upstream.

@jrtc27 @resistor @veselypeta @davidchisnall

resistor · 2025-10-01T11:02:54Z

I'd like to pre-merge this with Cheriot before sending it upstream, if possible. Otherwise we won't be able to track upstream while I sort out all of the merge & test issues. I can start on that next week, most likely.

resistor · 2025-10-01T11:04:09Z

llvm/lib/Target/RISCV/RISCVRegisterInfo.td

```diff
+    let isConstant = true in def X0_Y : RISCVCapReg<X0, "x0", ["zero", "null"]>,
+        DwarfRegAlias<X0>;
+    let CostPerUse = [0, 1] in {
+      def X1_Y : RISCVCapReg<X1, "x1", ["ra"]>, DwarfRegAlias<X1>;
```


How would folks feel about supporting the "c"-prefixed names as a non-standard extension for backwards compatibility?

I definitely like this, but wasn't sure upstream would want it. Similarly, adding aliases for the mnemonics downstream would make a lot of sense.

I think upstream will be sympathetic to backwards compatibility so long as it does not compromise in the implementation too much. In this case it seems like it would be minimal to support?

If adding it to altnames is sufficient then I imagine that could be fine

I originally included support for the original mnemonics but that adds a lot of untested code upstream so probably better to keep that downstream?

> If adding it to altnames is sufficient then I imagine that could be fine

That's what I had in mind

> I originally included support for the original mnemonics but that adds a lot of untested code upstream so probably better to keep that downstream?

I don't think it matters to us too much. Our bincompat is for xcheri, which we will just have to have live in parallel with the new opcodes & mnemonics for now.

Yeah, the only shared part will be the register info change. I imagine allowing x names may start triggering weird diagnostics until we ignore the "wrong-mode" load/store instructions, which currently never match due to the wrong register name. Morello has an IgnoredFeatures TableGen change to ignore them in the matcher, and I should probably pull that in once I get to the loads and stores. So it might make sense for you to hold off integrating this into CHERIoT until I have that.
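As a rough illustration of the altnames idea, the sketch below models a register-name parser that always accepts `xN` and a few ABI names (including `null` for x0, as in the diff above) and, only when a compatibility switch is on, also accepts the legacy `cN`/`c<abi>` spellings. `parseYRegName` and `AllowLegacyCNames` are hypothetical names; in LLVM this would come from the TableGen alternate-name tables rather than hand-written code.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string_view>
#include <utility>

namespace {
// Parse "0".."31" with no leading zeros.
std::optional<unsigned> parseRegIndex(std::string_view Digits) {
  if (Digits.empty() || Digits.size() > 2)
    return std::nullopt;
  if (Digits.size() == 2 && Digits[0] == '0')
    return std::nullopt;
  unsigned Value = 0;
  for (char C : Digits) {
    if (C < '0' || C > '9')
      return std::nullopt;
    Value = Value * 10 + static_cast<unsigned>(C - '0');
  }
  if (Value > 31)
    return std::nullopt;
  return Value;
}

// Small subset of ABI names, including the "null" alias for x0.
constexpr std::array<std::pair<std::string_view, unsigned>, 6> AbiNames = {
    {{"zero", 0}, {"null", 0}, {"ra", 1}, {"sp", 2}, {"gp", 3}, {"tp", 4}}};
} // namespace

// Hypothetical helper: map an assembler register name to a register index.
std::optional<unsigned> parseYRegName(std::string_view Name, bool AllowLegacyCNames) {
  if (Name.size() > 1 && Name[0] == 'x')
    return parseRegIndex(Name.substr(1));
  for (const auto &[Abi, Index] : AbiNames)
    if (Name == Abi)
      return Index;
  // Optional backwards compatibility: "c3", "csp", "cnull", ...
  if (AllowLegacyCNames && Name.size() > 1 && Name[0] == 'c') {
    std::string_view Rest = Name.substr(1);
    if (auto Numbered = parseRegIndex(Rest))
      return Numbered;
    for (const auto &[Abi, Index] : AbiNames)
      if (Rest == Abi)
        return Index;
  }
  return std::nullopt;
}
```

Keeping the legacy spellings behind a single switch matches the idea of adding them only as alternate names, without touching the canonical `xN` printing.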

resistor · 2025-10-02T11:46:14Z

llvm/lib/Target/RISCV/RISCVFeatures.td

```diff
+// fully backwards compatible with non-Y code).
+def FeatureCapMode : SubtargetFeature<"cap-mode", "IsCapMode", "true",
+                                      "Capability pointer mode">;
+def IsCapMode
```


We renamed this to XCheriPureCap using RISCVExtension, so that it plays nice with the rest of the RISCV extension infrastructure. That enables us to do things like have XCheriot automatically imply XCheriPureCap.

Though given the direction of the Y standard, perhaps we should invert the sense of the feature bit?

Thoughts on this part?

Hmm can't you have XCheriot imply the CapMode subtargetfeature?

It likely shouldn't have a x prefix since that is for vendor-extensions and the standard will have this.

> Hmm can't you have XCheriot imply the CapMode subtargetfeature?

We could do that in code, but the generic infrastructure for RV features implying other RV features only works for things declared as standard extensions or vendor extensions, and it then enforces the naming scheme.

Trying to page it all back in now, I think the issue might have been that CapMode can't imply XCheri (or Y in the future) because extensions can imply features, but features can't imply extensions.

Ah I was not aware that upstream introduced this restriction. I'm pretty sure you used to be able to imply certain non-extension features like CapMode.

jrtc27 · 2025-10-09T17:25:41Z

Given we have a new base ISA, I don't see why we need both I/E/Y and int/cap mode? We only have int/cap mode today because both are RVI, but you can just distinguish Y and not-Y, surely?

arichardson · 2025-10-10T00:16:30Z

> Given we have a new base ISA, I don't see why we need both I/E/Y and int/cap mode? We only have int/cap mode today because both are RVI, but you can just distinguish Y and not-Y, surely?

We still need a feature bit to select I vs Y as the base ISA, this does not exist right now and is covered by capmode. Or are you suggesting we introduce something like BaseIsaI BaseIsaY features instead of the capmode ones?

arichardson · 2025-10-10T00:17:38Z

Right now plain loads have no predicates so we would need to add one?

resistor · 2025-10-10T00:18:56Z

Also I need to be able to rebase XCheriot on it, which is hindered if cap mode is not distinct from YBase

jrtc27 · 2025-10-10T00:47:46Z

>> Given we have a new base ISA, I don't see why we need both I/E/Y and int/cap mode? We only have int/cap mode today because both are RVI, but you can just distinguish Y and not-Y, surely?
>
> We still need a feature bit to select I vs Y as the base ISA, this does not exist right now and is covered by capmode. Or are you suggesting we introduce something like BaseIsaI BaseIsaY features instead of the capmode ones?

i and e are already feature strings (for FeatureStdExtI and FeatureStdExtE). With y (FeatureStdExtY) you can then use those for predicates.

jrtc27 · 2025-10-10T00:48:39Z

> Also I need to be able to rebase XCheriot on it, which is hindered if cap mode is not distinct from YBase

Then you just add your base ISA to the list of ones for the capability mode predicate, like I and E are both part of the integer mode predicate.

jrtc27 · 2025-10-10T00:49:08Z

> Right now plain loads have no predicates so we would need to add one?

Yes, but that's true regardless of whether the predicate is cap-mode or i-or-e.

resistor · 2025-10-10T00:53:05Z

>> Also I need to be able to rebase XCheriot on it, which is hindered if cap mode is not distinct from YBase
>
> Then you just add your base ISA to the list of ones for the capability mode predicate, like I and E are both part of the integer mode predicate.

Ah, I didn't realize you were suggesting not having a feature but still retaining a tblgen predicate. I think that would work.

arichardson · 2025-10-10T06:07:32Z

>> Right now plain loads have no predicates so we would need to add one?
>
> Yes, but that's true regardless of whether the predicate is cap-mode or i-or-e.

Yeah, I think I misread your statement and we are all in agreement. We just need to figure out what the user-visible API to select the features should be.

Given we have the following four classes of instructions:

| Instruction | Available (Base=I/E) | Available (Base=Y) | Ext | Current Extra Predicate |
| --- | --- | --- | --- | --- |
| MUL | ✔️ | ✔️ | StdExtI | - |
| LW | ✔️ | ❌ | StdExtI | NotCapMode |
| LW_Y | ❌ | ✔️ | StdExtY | IsCapMode |
| YBASER | ✔️ | ✔️ | StdExtY | - |

we will need the extra predicates for the mode-dependent instructions.
Do you suggest we use something like BaseIsaY instead of the existing capmode name?

XCheriot should be able to do something like

```
def FeatureVendorXCheriot
    : RISCVExtension<1, 0, "CHERIoT extension", [FeatureStdExtE, FeatureStdExtY, BaseIsaY]>;
```

resistor · 2025-10-10T12:43:05Z

I think the proposal was to treat XCheriot as its own base ISA, and define IsCapMode as explicitly checking for IsStdExtY || IsXCheriot

arichardson · 2025-10-10T16:23:44Z

> I think the proposal was to treat XCheriot as its own base ISA, and define IsCapMode as explicitly checking for IsStdExtY || IsXCheriot

Oh I misread the previous ones, you are suggesting having both FeatureStdExtY and IsStdExtY? We still need to handle instructions like ybaser that work for both Y and I mode.
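A rough C++ model of the formulation resistor describes: capability mode is derived from the selected base ISA instead of being a separate user-visible feature, and I and E together form the integer-mode predicate, as jrtc27 suggests. All names are hypothetical stand-ins, not LLVM's generated subtarget accessors. The model deliberately does not settle the ybaser question raised just above, which would likely need a separate "Y instructions available" bit.

```cpp
#include <cassert>

// Hypothetical feature flags; names echo the TableGen features discussed above.
struct Features {
  bool StdExtI = false;
  bool StdExtE = false;
  bool StdExtY = false;        // Y selected as the base ISA
  bool VendorXCheriot = false; // CHERIoT treated as its own base ISA
};

// Capability-mode predicate: true for any capability base ISA.
bool isCapMode(const Features &F) { return F.StdExtY || F.VendorXCheriot; }

// Integer-mode predicate: I or E, but only when no capability base is selected
// (so an XCheriot configuration that also enables E still counts as cap mode).
bool isIntMode(const Features &F) {
  return (F.StdExtI || F.StdExtE) && !isCapMode(F);
}

// The two mode-dependent rows of the table above.
enum class ModeDependentInsn { LW, LW_Y };

bool isAvailable(ModeDependentInsn I, const Features &F) {
  switch (I) {
  case ModeDependentInsn::LW:
    return isIntMode(F);
  case ModeDependentInsn::LW_Y:
    return isCapMode(F);
  }
  return false;
}
```

Mode-independent rows such as MUL need no extra predicate in this model, which mirrors the "-" entries in the table.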

This is the first commit in a series of changes to add initial MC-layer support for the upcoming Y extension for CHERI.

Specification: https://riscv.github.io/riscv-cheri/

Co-authored-by: Jessica Clarke <[email protected]>

This adds initial features for the base RVY extension. Other extensions, such as the hybrid mode, will be added later.

Co-authored-by: Jessica Clarke <[email protected]>
Co-authored-by: Alexander Richardson <[email protected]>
Co-authored-by: Petr Vesely <[email protected]>

This adds MC-level support for most of the base Y extension instructions, restricted to the execution-mode-independent subset. The Y extension (CHERI for RISC-V) also introduces an execution mode that determines whether certain register operands use the full extended register or only the address subset (the current XLEN registers). The instructions that depend on execution mode (loads/stores/jumps + AUIPC) will be added in the next commit in this stack of changes.

Co-authored-by: Jessica Clarke <[email protected]>
Co-authored-by: Alexander Richardson <[email protected]>
Co-authored-by: Petr Vesely <[email protected]>

This adds support for all new RVY loads/stores (the capability-wide versions: the ly/sy instructions). Additionally, for RVY (CHERI), loads and stores are mode-dependent, using either a YLEN register or an XLEN register as the base. In the former case, loads/stores are authorized by that register; in the latter (compatibility mode), loads/stores keep using an address but are authorized by the DDC CSR. The assembler mnemonics are the same in both cases.

Prior to the standardization process, CHERI assembly used c-prefixed register names for capabilities, so we had the following syntax:

```
lw x4, 0(c3)  # capability mode: use new `CLW` instruction
lw x4, 0(x3)  # integer mode: use existing `LW` instruction
```

During standardization this was changed to keep the same register name in both modes, so now we have `lw x4, 0(x3)` in both modes. This allows using the RegClassByHwMode feature to reuse the same MC instruction but with a different operand type depending on the HwMode. The downstream fork had duplicated definitions, which meant a lot of switch statements needed to handle both MCInsts. This approach using HwMode should be much more maintainable and only introduces a minor diff compared to what we had downstream. It will also make it much easier to add support for RVY versions of other extensions such as vector, since we just need to swap the `GPRMem` operand for `PtrMem`.
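To make the HwMode idea concrete, here is a small self-contained C++ model. It is not TableGen and not LLVM's `RegClassByHwMode` API; `HwMode`, `RegClass::YGPR`, `RegClassByMode`, and `describeLoadBase` are hypothetical names. It shows one instruction (`lw`) whose base-operand class, and therefore what authorizes the access, depends only on the current mode.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of a per-HwMode operand class (not LLVM's actual API).
enum class HwMode : unsigned { Integer = 0, Capability = 1 };
enum class RegClass { GPR, YGPR }; // XLEN-wide vs. YLEN-wide (capability) registers

// One operand definition whose register class is selected by the HwMode,
// loosely analogous to using `PtrMem` instead of `GPRMem`.
struct RegClassByMode {
  std::array<RegClass, 2> PerMode;
  RegClass get(HwMode M) const { return PerMode[static_cast<unsigned>(M)]; }
};

constexpr RegClassByMode PtrMemBase{{RegClass::GPR, RegClass::YGPR}};

struct LoadBaseInfo {
  RegClass BaseClass;   // class of the base register in `lw x4, 0(x3)`
  bool AuthorizedByDDC; // true when the base is a plain address (integer mode)
};

// Same mnemonic and operand syntax, interpreted per mode.
LoadBaseInfo describeLoadBase(HwMode M) {
  RegClass RC = PtrMemBase.get(M);
  return {RC, RC == RegClass::GPR};
}
```

The point of the design is that only the operand table varies by mode, so code that switches on opcodes sees a single `LW` rather than an `LW`/`CLW` pair.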

…rser

This ensures that broken AsmParser expansions trigger an assertion failure. I noticed this while adding Y extension support: expanding pseudos generated instructions that failed their predicates, but unlike instructions generated during codegen, the predicate verification function was not being called here.
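The change amounts to running the predicate check on every instruction produced by an AsmParser pseudo expansion before emitting it. A minimal model, with `Inst`, `PredicateCheck`, `allPredicatesPass`, and `emitExpansion` as hypothetical stand-ins for the MC types and the generated verification function:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for MCInst and the generated predicate checker.
struct Inst {
  unsigned Opcode;
};
using PredicateCheck = std::function<bool(const Inst &)>;

// True if every instruction produced by the expansion satisfies its predicates.
bool allPredicatesPass(const std::vector<Inst> &Expanded, const PredicateCheck &Verify) {
  return std::all_of(Expanded.begin(), Expanded.end(),
                     [&](const Inst &I) { return Verify(I); });
}

// Emit a pseudo expansion, asserting (in assert-enabled builds) that no
// expanded instruction fails its predicates.
void emitExpansion(const std::vector<Inst> &Expanded, const PredicateCheck &Verify,
                   std::vector<Inst> &Out) {
  assert(allPredicatesPass(Expanded, Verify) &&
         "pseudo expansion produced an instruction that fails its predicates");
  Out.insert(Out.end(), Expanded.begin(), Expanded.end());
}
```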

…Class

Also fix the missing space in the error message. I noticed this while changing RISC-V's loads and stores to use RegClassByHwMode, when I got a non-descriptive error while TableGen was parsing the InstAliases.

## Summary

Fix `FindProcesses` to respect Android's `hidepid=2` security model and enable name matching for Android apps.

## Problem

1. Called `adb shell pidof` or `adb shell ps` directly, bypassing Android's process visibility restrictions.
2. Name matching failed for Android apps: it searched for `com.example.myapp`, but the GDB Remote Protocol reports `app_process64`.

Android apps fork from Zygote, so `/proc/PID/exe` points to `app_process64` for all apps. The actual package name is only in `/proc/PID/cmdline`. The previous implementation applied name filters without supplementing with cmdline, so searches failed.

## Fix

- Delegate to lldb-server via GDB Remote Protocol (respects `hidepid=2`)
- Get all visible processes, supplement zygote/app_process entries with cmdline, then apply name matching
- Only fetch cmdline for zygote apps (performance), parallelize with `xargs -P 8`
- Remove redundant code (GDB Remote Protocol already provides GID/arch)

## Test Results

### Before this fix:

```
(lldb) platform process list
error: no processes were found on the "remote-android" platform

(lldb) platform process list -n com.example.hellojni
1 matching process was found on "remote-android"
PID    PARENT USER       TRIPLE                         NAME
====== ====== ========== ============================== ============================
5276   359    u0_a192                                   com.example.hellojni
                         ^^^^^^^^ Missing triple!
```

### After this fix:

```
(lldb) platform process list
PID    PARENT USER       TRIPLE                         NAME
====== ====== ========== ============================== ============================
1      0      root       aarch64-unknown-linux-android  init
2      0      root                                      [kthreadd]
359    1      system     aarch64-unknown-linux-android  app_process64
5276   359    u0_a192    aarch64-unknown-linux-android  com.example.hellojni
5357   5355   u0_a192    aarch64-unknown-linux-android  sh
5377   5370   u0_a192    aarch64-unknown-linux-android  lldb-server
                         ^^^^^^^^ User-space processes now have triples!

(lldb) platform process list -n com.example.hellojni
1 matching process was found on "remote-android"
PID    PARENT USER       TRIPLE                         NAME
====== ====== ========== ============================== ============================
5276   359    u0_a192    aarch64-unknown-linux-android  com.example.hellojni

(lldb) process attach -n com.example.hellojni
Process 5276 stopped
* thread #1, name = 'example.hellojni', stop reason = signal SIGSTOP
```

## Test Plan

With an Android device/emulator connected:

1. Start lldb-server on device:
   ```bash
   adb push lldb-server /data/local/tmp/
   adb shell chmod +x /data/local/tmp/lldb-server
   adb shell /data/local/tmp/lldb-server platform --listen 127.0.0.1:9500 --server
   ```
2. Connect from LLDB:
   ```
   (lldb) platform select remote-android
   (lldb) platform connect connect://127.0.0.1:9500
   (lldb) platform process list
   ```
3. Verify:
   - `platform process list` returns all processes with triple information
   - `platform process list -n com.example.app` finds Android apps by package name
   - `process attach -n com.example.app` successfully attaches to Android apps

## Impact

Restores `platform process list` on Android with architecture information and package name lookup. All name matching modes now work correctly.

Fixes llvm#164192
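The matching rule from the Fix section (supplement app_process entries with cmdline, then apply name matching) can be modelled as below. This is an illustrative sketch, not LLDB's `FindProcesses`; `ProcessEntry`, `isAppProcess`, `firstArg`, and `matchesName` are hypothetical, and it assumes the package name is the first NUL-separated argument in `/proc/PID/cmdline`.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical process record; field names are illustrative only.
struct ProcessEntry {
  int Pid;
  std::string ExeName; // name derived from /proc/PID/exe, e.g. "app_process64"
  std::string Cmdline; // raw /proc/PID/cmdline (NUL-separated arguments)
};

// Zygote-forked apps all report the app_process binary as their executable.
bool isAppProcess(std::string_view Exe) {
  return Exe == "app_process" || Exe == "app_process32" || Exe == "app_process64";
}

// First NUL-separated argument (assumed to be the package name for apps).
std::string_view firstArg(std::string_view Cmdline) {
  return Cmdline.substr(0, Cmdline.find('\0'));
}

// Match on the executable name, or on the cmdline for app_process entries.
bool matchesName(const ProcessEntry &P, std::string_view Wanted) {
  if (P.ExeName == Wanted)
    return true;
  return isAppProcess(P.ExeName) && firstArg(P.Cmdline) == Wanted;
}
```

Only consulting cmdline for app_process entries keeps the extra `/proc` reads limited to zygote children, in line with the performance note above.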

…am (llvm#167724)

This got exposed by `09262656f32ab3f2e1d82e5342ba37eecac52522`.

The underlying stream of `m_os` is referenced by the `TextDiagnostic` member of `TextDiagnosticPrinter`. It got turned into a `llvm::formatted_raw_ostream` in the commit above. When `~TextDiagnosticPrinter` (and thus `~TextDiagnostic`) is invoked, we now call `~formatted_raw_ostream`, which tries to access the underlying stream. But `m_os` was already deleted because it is earlier in the order of destruction in `TextDiagnosticPrinter`. Move the `m_os` member before the `TextDiagnosticPrinter` to avoid a use-after-free.

Drive-by:

* Also move the `m_output` member, which `m_os` holds a reference to. The fact that it's a reference indicates the expectation is most likely that the string outlives the stream.

The ASAN macOS bot is currently failing with this:

```
08:15:39 =================================================================
08:15:39 ==61103==ERROR: AddressSanitizer: heap-use-after-free on address 0x60600012cf40 at pc 0x00012140d304 bp 0x00016eecc850 sp 0x00016eecc848
08:15:39 READ of size 8 at 0x60600012cf40 thread T0
08:15:39     #0 0x00012140d300 in llvm::formatted_raw_ostream::releaseStream() FormattedStream.h:205
08:15:39     #1 0x00012140d3a4 in llvm::formatted_raw_ostream::~formatted_raw_ostream() FormattedStream.h:145
08:15:39     #2 0x00012604abf8 in clang::TextDiagnostic::~TextDiagnostic() TextDiagnostic.cpp:721
08:15:39     #3 0x00012605dc80 in clang::TextDiagnosticPrinter::~TextDiagnosticPrinter() TextDiagnosticPrinter.cpp:30
08:15:39     #4 0x00012605dd5c in clang::TextDiagnosticPrinter::~TextDiagnosticPrinter() TextDiagnosticPrinter.cpp:27
08:15:39     #5 0x0001231fb210 in (anonymous namespace)::StoringDiagnosticConsumer::~StoringDiagnosticConsumer() ClangModulesDeclVendor.cpp:47
08:15:39     #6 0x0001231fb3bc in (anonymous namespace)::StoringDiagnosticConsumer::~StoringDiagnosticConsumer() ClangModulesDeclVendor.cpp:47
08:15:39     #7 0x000129aa9d70 in clang::DiagnosticsEngine::~DiagnosticsEngine() Diagnostic.cpp:91
08:15:39     #8 0x0001230436b8 in llvm::RefCountedBase<clang::DiagnosticsEngine>::Release() const IntrusiveRefCntPtr.h:103
08:15:39     #9 0x0001231fe6c8 in (anonymous namespace)::ClangModulesDeclVendorImpl::~ClangModulesDeclVendorImpl() ClangModulesDeclVendor.cpp:93
08:15:39     #10 0x0001231fe858 in (anonymous namespace)::ClangModulesDeclVendorImpl::~ClangModulesDeclVendorImpl() ClangModulesDeclVendor.cpp:93
...
08:15:39
08:15:39 0x60600012cf40 is located 32 bytes inside of 56-byte region [0x60600012cf20,0x60600012cf58)
08:15:39 freed by thread T0 here:
08:15:39     #0 0x0001018abb88 in _ZdlPv+0x74 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x4bb88)
08:15:39     #1 0x0001231fb1c0 in (anonymous namespace)::StoringDiagnosticConsumer::~StoringDiagnosticConsumer() ClangModulesDeclVendor.cpp:47
08:15:39     #2 0x0001231fb3bc in (anonymous namespace)::StoringDiagnosticConsumer::~StoringDiagnosticConsumer() ClangModulesDeclVendor.cpp:47
08:15:39     #3 0x000129aa9d70 in clang::DiagnosticsEngine::~DiagnosticsEngine() Diagnostic.cpp:91
08:15:39     #4 0x0001230436b8 in llvm::RefCountedBase<clang::DiagnosticsEngine>::Release() const IntrusiveRefCntPtr.h:103
08:15:39     #5 0x0001231fe6c8 in (anonymous namespace)::ClangModulesDeclVendorImpl::~ClangModulesDeclVendorImpl() ClangModulesDeclVendor.cpp:93
08:15:39     #6 0x0001231fe858 in (anonymous namespace)::ClangModulesDeclVendorImpl::~ClangModulesDeclVendorImpl() ClangModulesDeclVendor.cpp:93
...
08:15:39
08:15:39 previously allocated by thread T0 here:
08:15:39     #0 0x0001018ab760 in _Znwm+0x74 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x4b760)
08:15:39     #1 0x0001231f8dec in lldb_private::ClangModulesDeclVendor::Create(lldb_private::Target&) ClangModulesDeclVendor.cpp:732
08:15:39     #2 0x00012320af58 in lldb_private::ClangPersistentVariables::GetClangModulesDeclVendor() ClangPersistentVariables.cpp:124
08:15:39     #3 0x0001232111f0 in lldb_private::ClangUserExpression::PrepareForParsing(lldb_private::DiagnosticManager&, lldb_private::ExecutionContext&, bool) ClangUserExpression.cpp:536
08:15:39     #4 0x000123213790 in lldb_private::ClangUserExpression::Parse(lldb_private::DiagnosticManager&, lldb_private::ExecutionContext&, lldb_private::ExecutionPolicy, bool, bool) ClangUserExpression.cpp:647
08:15:39     #5 0x00012032b258 in lldb_private::UserExpression::Evaluate(lldb_private::ExecutionContext&, lldb_private::EvaluateExpressionOptions const&, llvm::StringRef, llvm::StringRef, std::__1::shared_ptr<lldb_private::ValueObject>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, lldb_private::ValueObject*) UserExpression.cpp:280
08:15:39     #6 0x000120724010 in lldb_private::Target::EvaluateExpression(llvm::StringRef, lldb_private::ExecutionContextScope*, std::__1::shared_ptr<lldb_private::ValueObject>&, lldb_private::EvaluateExpressionOptions const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, lldb_private::ValueObject*) Target.cpp:2905
08:15:39     #7 0x00011fc7bde0 in lldb::SBTarget::EvaluateExpression(char const*, lldb::SBExpressionOptions const&) SBTarget.cpp:2305
08:15:39 ==61103==ABORTING
...
```
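The root cause is the C++ rule that non-static data members are destroyed in reverse order of declaration. A minimal model with hypothetical `Stream`, `Printer`, and `Consumer` types (loose stand-ins for `m_os`, `TextDiagnosticPrinter`, and `StoringDiagnosticConsumer`) shows the fixed layout: because the stream is declared first, it is destroyed last, after the printer that still refers to it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records destructor calls so the destruction order can be inspected.
std::vector<std::string> DestructionLog;

struct Stream {
  ~Stream() { DestructionLog.push_back("Stream"); }
};

// Holds a non-owning pointer to a stream it may still touch in its destructor.
struct Printer {
  Stream *OS;
  ~Printer() { DestructionLog.push_back("Printer"); }
};

// Fixed layout: members are destroyed in reverse declaration order, so
// declaring the stream first means it outlives the printer that uses it.
struct Consumer {
  Stream OS;
  Printer P{&OS};
};
```

Swapping the two member declarations would destroy `OS` first, leaving `~Printer` with a dangling pointer, which is the shape of the use-after-free in the report above.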

llvm#168105) …63019)" This reverts commit 92e5608.

arichardson force-pushed the 2025-rvy-initial branch from 5c19299 to faece38 September 30, 2025 19:33

resistor reviewed Oct 1, 2025


resistor reviewed Oct 2, 2025


arichardson force-pushed the 2025-rvy-initial branch 2 times, most recently from 4c33fc1 to d7af9f1 October 7, 2025 07:19

arichardson and others added 7 commits November 16, 2025 17:45

[DO_NOT_MERGE][RVY] add test with broken expansions

e2a8c10

[TableGen] Report a better error when an InstAlias does not use a Reg…

a0011c3


arichardson force-pushed the 2025-rvy-initial branch from d7af9f1 to a0011c3 November 17, 2025 01:46

arichardson pushed a commit that referenced this pull request Nov 17, 2025

Revert "[Transform][LoadStoreVectorizer] allow redundant in Chain (#1… (

a407d02


arichardson changed the base branch from upstream-basline to master November 17, 2025 01:48

arichardson changed the base branch from master to upstream-basline November 17, 2025 01:48

5 participants
